DETAILED ACTION

1. Claims 1-37 have been examined.

Acknowledgment of papers filed: oath, specification, and drawings on July 31,
2003. The papers filed have been placed on record.

Specification 

2. The disclosure is objected to because of the following informalities: In paragraph 
29, line 7, "200. 300" should read "100. 200". 

Appropriate correction is required. 

Claim Objections 

3. Claims 29 and 30 are objected to because they recite the limitation of "the article 
of claim 24". However, claim 24 claims a method. For the purpose of examination, the 
examiner assumes claims 29 and 30 are dependent on claim 25.

Claim Rejections - 35 USC § 103 



4. The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that 
form the basis for the rejections under this section made in this Office action: 

5. Claims 1-3, 5-9, 11-13, 15, 18-22, and 25-29 are rejected under 35 U.S.C.
103(a) as being unpatentable over Damron (U.S. Patent Application Publication No. US
2004/0148491 A1) in view of Jamil (U.S. Patent Application Publication No. US
2003/0126365 A1).

6. Referring to claim 1, Damron discloses an apparatus comprising: a first
processor (processor 102, see fig. 1) to execute a main thread instruction stream (see
paragraph 28, lines 3-4 regarding the main processor 102 executing a main thread) that 
includes a delinquent instruction (any load which is known not to hit); a second 
processor (processor 104, see fig. 1) to execute a helper thread instruction stream (see 
paragraph 28, lines 4-6 regarding processor 104 executing a scout thread) that includes 
a subset of the main thread instruction stream (see paragraph 26, lines 1-4), wherein 
the subset includes the delinquent instruction (see paragraph 61, lines 1-3 & last 7 
lines); wherein said first and second processors each include a private data cache (data 
cache 222; see figs. 2 and 3; paragraph 36, lines 11-12); a shared memory system 
(shared cache [106 & 224] and shared main memory [108 & 207], see figs. 1-3) coupled 
to said first processor and to said second processor (see paragraph 36, lines 11-12);
and logic to retrieve, responsive to a miss of requested data (any data not in private 
data cache 222) for the delinquent instruction (instruction referencing data not in cache) 
in the private cache of the second processor (paragraph 26, lines 5-8; "warm-up" implies that
a scout thread runs into data that are not in the cache), the requested data from the 
shared memory system (see paragraph 26, lines 5-8; "warm-up" further implies that the 
data are loaded from main memory); the logic further to provide requested data to the 
first processor (see paragraph 26, lines 6-9 regarding warming up the shared cache to
provide the main processor data used by the scout processor). 

Damron does not expressly disclose that the logic is further to provide the
requested data to the private data cache of the first processor.

Jamil teaches that the logic is further to provide the requested data to the private 
data cache of the first processor (paragraph 4, lines 18-21). 

For this modification to be successful, instead of writing data to the shared cache 
to be read by another processor (Damron, paragraph 26, lines 6-9), it would be written 
to the private cache of that processor (Jamil, paragraph 4, lines 18-21). In this instance, 
when the scout thread is prefetching data, it would prefetch the data by reading the data 
into its private cache to be used, and then transfer it to the private cache of the primary 
processor. 
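
For illustration only, the modification described above can be sketched as a toy model. The class and method names (Core, FillLogic, helper_load) are hypothetical and appear in neither Damron nor Jamil; the sketch assumes a dictionary stands in for each private cache and for the shared memory system.

```python
class Core:
    """Toy model of a processor core with a private data cache."""

    def __init__(self, name):
        self.name = name
        self.private_cache = {}  # address -> data


class FillLogic:
    """Hypothetical logic that services misses suffered by the helper core."""

    def __init__(self, shared_memory, main_core):
        self.shared_memory = shared_memory  # stands in for shared cache + main memory
        self.main_core = main_core

    def helper_load(self, helper_core, address):
        # Hit in the helper core's private cache: nothing to retrieve.
        if address in helper_core.private_cache:
            return helper_core.private_cache[address]
        # Miss: retrieve the data from the shared memory system ...
        data = self.shared_memory[address]
        helper_core.private_cache[address] = data
        # ... and also place it in the main core's private cache.
        self.main_core.private_cache[address] = data
        return data


shared_memory = {0x100: "A", 0x200: "B"}
main_core = Core("main")
helper_core = Core("helper")
logic = FillLogic(shared_memory, main_core)

logic.helper_load(helper_core, 0x100)
print(0x100 in main_core.private_cache)  # True
```

In this sketch the main core finds address 0x100 in its own private cache without having issued a load itself, which is the behavior the proposed combination relies on.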

It would have been obvious for one of ordinary skill in the art at the time of the 
invention to have modified the invention of Damron by making the logic further to 
provide the requested data to the private data cache of the first processor as taught by 
Jamil in order to decrease the access time of data required by the primary processor 
because communication with on-chip caches, or caches of the same level, is faster than
communicating through the use of an external shared cache (Jamil, paragraph 5). 

7. Regarding claim 2, Damron/Jamil discloses the apparatus of claim 1, wherein: 
the first processor, second processor and logic are included within a chip package (see 
Damron, paragraph 33, lines 1-4). 



Application/Control Number: 10/632,431 Page 5 

Art Unit: 2181 

8. Regarding claim 3, Damron/Jamil discloses the apparatus of claim 1, wherein:
the shared memory system includes a shared cache (see Damron, fig. 1, ref. 106; 
paragraph 32, lines 1-2). 

9. Regarding claim 5, Damron/Jamil discloses the apparatus of claim 3, wherein: 
the shared cache is included within a chip package (see Damron, paragraph 33, lines 4-6).

10. Regarding claims 6, 20, and 27, Damron/Jamil discloses the apparatus of claim 1,
the method of claim 18, and the article of claim 25.

Damron does not expressly disclose that the logic is further to provide the 
requested data from the shared memory system to the private data cache of the second 
processor. 

Jamil teaches logic providing requested data from the shared memory system to 
the private data cache of the second processor (see paragraph 23, lines 2-7). 

It would have been obvious for one of ordinary skill in the art at the time of the 
invention to have modified the combined invention of Damron/Jamil (see above 
regarding claim 1) by providing requested data from the shared memory system to the 
private data cache of the second processor as taught by Jamil in order to decrease 
access time for the second processor by pulling data into the private cache. 

11. Regarding claim 7, Damron/Jamil discloses the apparatus of claim 1, wherein:
said first and second processors are included in a plurality of n processors, each of said 
plurality of processors is coupled to the shared memory system (Damron, fig. 1, ref. 106 
& 108; paragraph 32); and each of said n plurality of processors includes a private data 
cache (Damron, fig. 3, ref. 222; paragraph 36, lines 11-12).

Damron/Jamil does not expressly disclose that n>2. 

Jamil teaches that n>2 (n=4 processors, paragraph 19, lines 1-4). 

The combination would be successful if two sets of processors were used; 
wherein 2 of the processors are main processors, and 2 of the processors are helper 
processors (one helper processor for each main processor). The pair would act much 
like having only a main processor and helper processor wherein each set of processors 
would run on one thread. 

It would have been obvious for one of ordinary skill in the art at the time of the 
invention to have modified the combined invention of Damron/Jamil (see above 
regarding claim 1) by using more than 2 processors as taught by Jamil in order to allow
more threads to be executed simultaneously, which would increase the
performance and throughput of the processing system.

12. Regarding claim 8, Damron/Jamil discloses the apparatus of claim 7.

Damron does not expressly disclose that the logic is further to provide the
requested data from the shared memory system to each of the n private data caches.

Jamil teaches that the logic is further to provide the requested data from the
shared memory system to each of the n private data caches (see paragraph 23, lines 2-7).

The combination would be successful if, when a cache miss occurs in the
second processor, the requested data were loaded into all private data caches.
When the invention of Damron fetches data into cache, the data is loaded into the 
shared cache, which is accessible by all processors. The combination would therefore 
have to make the data accessible to all processors and must do that by transferring the 
data to each of the private data caches. 
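
As a rough sketch of the transfer described above, the following toy function copies one requested item from a shared memory system into every private data cache. The function name broadcast_fill and the dictionary-based caches are assumptions made for illustration; n = 4 is taken from the four-processor arrangement attributed to Jamil in paragraph 11 above.

```python
def broadcast_fill(address, shared_memory, private_caches):
    """Copy the data at `address` from shared memory into every private cache."""
    data = shared_memory[address]
    for cache in private_caches:
        cache[address] = data
    return data


shared_memory = {0x40: 7}
private_caches = [{} for _ in range(4)]  # n = 4 private data caches

broadcast_fill(0x40, shared_memory, private_caches)
print(all(cache[0x40] == 7 for cache in private_caches))  # True
```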

It would have been obvious for one of ordinary skill in the art at the time of the 
invention to have modified the combined invention of Damron/Jamil (see above 
regarding claim 7) by making the logic provide the requested data from the shared 
memory system to each of the n private data caches as taught by Jamil in order to 
decrease access time for data needed by the processors (see above regarding claim 1).

13. Regarding claim 9, Damron/Jamil discloses the apparatus of claim 7, wherein:
the logic is further to provide the requested data from the shared memory system to a 
subset of the n private data caches, the subset including x (1; first processor; see above 
regarding claim 1) of the n (2; first processor and second processor; see above 
regarding claim 1) private data caches, where 0<x<n (0 <1< 2). 

Note that if the logic provides the requested data to the private cache of the first
processor, it would have provided the data to x (1) private data cache. Further note that
the subset can include all processors because The American Heritage College
Dictionary defines subset as "A set contained within a set". A set can be contained
within itself.
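
The numeric reading above (x = 1, n = 2) and the dictionary sense of "subset" can each be checked with a few lines. This is only an illustrative sketch; the helper name in_claimed_range and the set labels are hypothetical.

```python
def in_claimed_range(x, n):
    """Return True when 0 < x < n."""
    return 0 < x < n


all_caches = {"first", "second"}  # n = 2 private data caches
filled = {"first"}                # x = 1 cache receives the data

print(in_claimed_range(len(filled), len(all_caches)))  # True, since 0 < 1 < 2
print(filled <= all_caches)      # True: filled is a subset of all_caches
print(all_caches <= all_caches)  # True: a set is contained within itself
```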

14. Claim 11 recites limitations equivalent to those set forth in claim 1 and is
therefore rejected using the same grounds as claim 1.

15. Regarding claim 12, Damron/Jamil discloses the apparatus of claim 11, further
comprising: a shared memory system coupled to said first processor and to said second
processor (Damron, fig. 1, ref. 106 & 108; paragraph 32); wherein said logic is further to
retrieve the requested data from the shared memory system if the requested data is not 
available in the other private data cache (see above regarding claim 6). 

Note that the limitation "wherein said logic is further... other private data cache"
is equivalent to the limitation of claim 6 and is rejected on the same grounds. 

16. Regarding claim 13, Damron/Jamil discloses the apparatus of claim 11.

Damron does not expressly disclose that the logic is included within an
interconnect, wherein the interconnect is to provide networking logic for communication
among the first processor, the second processor, and the shared memory system.

Jamil teaches that the logic is included within an interconnect (refs. 151-156,
130, see fig. 1), wherein the interconnect is to provide networking logic for
communication among the first processor, the second processor (see paragraph 23,
lines 4-7), and the shared memory system (see paragraph 23, lines 6-7).

It would have been obvious for one of ordinary skill in the art at the time of the 
invention to have modified the invention of Damron by including logic within an 
interconnect, wherein the interconnect is to provide networking logic for communication 
among the first processor, the second processor, and the shared memory system in 
order to maintain cache coherency between caches without routing data off-chip when 
storing data in private caches (see Jamil, abstract). This increases memory throughput 
between processors (see Jamil, paragraph 5). 

17. Regarding claim 15, Damron/Jamil discloses the apparatus of claim 11, wherein: 
the memory system includes a shared cache (Damron, fig. 1, ref. 106; paragraph 32, 
lines 1-2). 

18. Regarding claims 18 and 25, Damron/Jamil discloses a method and article 
comprising: determining that a helper core has suffered a miss in a private cache for a 
load instruction (Damron paragraph 60, lines 8-12) while executing a helper thread; and 
prefetching load data for the load instruction into a private cache of a main core (see 
above regarding claim 1). 

Note that a processor must execute a program in order to miss data in a cache.

19. Regarding claims 19 and 26, Damron/Jamil discloses the method of claim 18 and
article of claim 25, wherein prefetching further comprises: retrieving the load data from a 
shared memory system; and providing the load data to the private cache of the main 
core (see above regarding claim 1). 

20. Regarding claims 21 and 28, Damron/Jamil discloses the method of claim 18 and 
article of claim 25, further comprising: providing load data for the load instruction from a 
shared memory system (main memory and shared cache) into the private cache (see 
above regarding claim 8) for each of a plurality of helper cores (helper processors; see 
above regarding claim 7). 

21. Regarding claims 22 and 29, Damron/Jamil discloses the method of claim 18 and
article of claim 25, wherein prefetching further comprises: retrieving the load data from a 
private cache of a helper core; and providing the load data to the private cache of the 
main core (see above regarding claim 8; if data is sent to all caches, it will be sent to the 
cache of the main core). 

Claim 31 recites equivalent limitations as stated in claim 12 and is therefore rejected 
using the same grounds. 

22. Claims 4, 16, and 32-37 are rejected under 35 U.S.C. 103(a) as being 
unpatentable over Damron (U.S. Patent Application Publication No. US 2004/0148491 
A1) in view of Jamil (U.S. Patent Application Publication No. US 2003/0126365 A1) and
Jeddeloh (U.S. Patent No. US 6,789,168 B2).

Regarding claims 4 and 16, Damron/Jamil disclose the apparatus of claim 3 and 
claim 15. 

Damron/Jamil does not expressly disclose that the shared memory system 
includes a second shared cache. 

Jeddeloh teaches that the shared memory system includes a second shared 
cache (col. 3, lines 66-67 & col. 4, lines 1-2). 

The invention of Damron would have been modified by adding L3 cache 
implemented in the chipset of the computer in addition to the L2 cache. 

It would have been obvious for one of ordinary skill in the art at the time of the 
invention to have modified the combined invention of Damron/Jamil because the use of 
L3 cache increases the overall size of the cache making memory accesses less 
frequent and therefore increasing overall system bandwidth. 

23. Regarding claim 32, Damron discloses a system comprising: a memory system 
(main memory 108 and shared cache 106; see fig. 1; paragraph 28, lines 6-9); a first 
processor (main processor 102; see fig. 1), coupled to the memory system, to execute a 
first instruction stream (see paragraph 28, lines 3-4); a second processor (scout thread 
processor 104; see fig. 1), coupled to the memory system, to concurrently execute a 
second instruction stream (see paragraph 28, lines 4-6). 

Damron does not expressly disclose helper threading logic to provide fill data
prefetched by the second processor to the first processor.

Jamil teaches helper threading logic (fig. 1, refs. 151-156, 130) to provide fill data
prefetched by the second processor to the first processor (Jamil, paragraph 23, lines 4-5;
see above regarding claim 1).

It would have been obvious for one of ordinary skill in the art at the time of the 
invention to have modified the invention of Damron by adding helper threading logic to 
provide fill data prefetched by the second processor to the first processor as taught by 
Jamil in order to decrease cache access time for the main processor (see above 
regarding claim 1).

Further, Damron/Jamil does not expressly disclose that the memory system 
includes a dynamic random access memory. 

Jeddeloh teaches a memory system that includes a dynamic random access 
memory (see paragraph 1). 

It would have been obvious at the time of the invention for one of ordinary skill in 
the art to have modified the combined invention of Damron/Jamil by using a memory 
system that includes a dynamic random access memory as taught by Jeddeloh in order
to decrease the physical size of the cache as compared to SRAM (see Jeddeloh col. 4,
lines 52-54).

24. Regarding claim 33, Damron/Jamil/Jeddeloh discloses the system of claim 32, 
wherein: the helper threading logic is further to push the fill data to the first processor 
before the fill data is requested by an instruction of the first instruction stream (see
above regarding claim 1).

Note that Damron updates the shared memory (the memory being accessed by
the main processor) as soon as the scout processor receives it. Using the cache setup
of Jamil, the memory that would be updated would be the private data cache of the main
processor, and the update would occur at the time the data is reached in the scout
thread, ahead of the main thread.

25. Claim 34 recites an equivalent limitation as set forth in claim 22 and is therefore 
rejected using the same grounds. 

26. Regarding claim 35, Damron/Jamil/Jeddeloh discloses the system of claim 32, 
wherein: the helper threading logic is further to provide the fill data to the first processor 
from the memory system (see above regarding claim 1).

Note that the fill data comes from the shared memory indirectly through the 
cache of the second processor. 

27. Regarding claim 36, Damron/Jamil/Jeddeloh discloses the system of claim 32, 
further comprising: an interconnect (Jamil, fig. 1, refs. 151-156, 130) that manages 
communication between the first and second processors (Jamil, paragraph 39, lines 8-11).

Regarding claim 37, Damron/Jamil/Jeddeloh discloses the system of claim 32, wherein:
the memory system includes a cache that is shared by the first and second processors
(Damron, fig. 1, ref. 106; paragraph 32, lines 1-2).

28. Claims 10 and 17 are rejected under 35 U.S.C. 103(a) as being unpatentable 
over Damron (U.S. Patent Application Publication No. US 2004/0148491 A1) in view of 
Jamil (U.S. Patent Application Publication No. US 2003/0126365 A1) and Luk (U.S. 
Patent Application Publication No. US 2002/0055964 A1). 

29. Regarding claim 10, Damron/Jamil disclose the apparatus of claim 1.

Damron/Jamil do not expressly disclose that the first processor is further to
trigger the second processor's execution of the helper thread instruction stream
responsive to a trigger instruction in the main thread instruction stream.

Luk teaches the use of a trigger instruction in a main thread to start a
helper thread (paragraphs 8-9).

It would have been obvious to one of ordinary skill in the art at the time of the 
invention to modify the combined invention of Damron/Jamil by including an instruction 
in the main instruction stream to start execution for the helper thread as taught by Luk in 
order to use hardware to prefetch in situations where prefetching will help the current
thread, and to be able to stop the pre-execution thread if it will not help and another
thread needs to use the hardware.
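
The trigger mechanism discussed above can be pictured with a small software analogy. This is a sketch under stated assumptions, not Luk's implementation: the "TRIGGER" marker, the function names, and the use of an operating-system thread in place of a second processor are all hypothetical.

```python
import threading


def helper_thread(addresses, prefetched):
    """Helper thread: record each address it would prefetch."""
    for address in addresses:
        prefetched.add(address)


def run_main_thread(instructions, helper_addresses):
    """Walk the main instruction stream; start the helper once on a trigger."""
    prefetched = set()
    helper = None
    for op in instructions:
        if op == "TRIGGER" and helper is None:
            helper = threading.Thread(
                target=helper_thread, args=(helper_addresses, prefetched)
            )
            helper.start()
        # Any other instruction is treated as ordinary main-thread work.
    if helper is not None:
        helper.join()  # wait so the result below is deterministic
    return prefetched


result = run_main_thread(["ADD", "TRIGGER", "LOAD"], [0x10, 0x20])
print(result == {0x10, 0x20})  # True
```

Without a trigger in the instruction stream, the helper is never started and nothing is prefetched in this model.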

30. Claim 17 recites equivalent limitations as stated in claim 10 and is therefore
rejected using the same grounds.

31. Claims 14, 23, 24, and 30 are rejected under 35 U.S.C. 103(a) as being 
unpatentable over Damron (U.S. Patent Application Publication No. US 2004/0148491 
A1) in view of Jamil (U.S. Patent Application Publication No. US 2003/0126365 A1) and 
Dhong (U.S. Patent No. 6,138,208).

Regarding claim 14, Damron/Jamil discloses the apparatus of claim 13, wherein: 
the first and second processor are each included in a plurality of n processors (n = 2; 
only the first and second processors); and the interconnect is further to broadcast a 
request for the requested data to each of the n processors and to the shared memory 
system (Jamil, paragraph 24, lines 11-18). 

Damron/Jamil do not expressly disclose that the requests are done concurrently. 

Dhong teaches a method for concurrently requesting data from two levels of 
cache (col. 4, lines 35-43). 

It would have been obvious for one of ordinary skill in the art at the time of the 
invention to modify the combined invention of Damron/Jamil (see above regarding claim 
1) to concurrently request data in the private data caches of private processors (L1
cache) and the shared data cache (L2 cache) as taught by Dhong in order to decrease
the access time for the higher level of cache by overlapping L1 and L2 cache accesses
(Dhong, col. 4, lines 40-43). 
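
The benefit of overlapping the two accesses can be shown with simple arithmetic for the case of an L1 miss. The cycle counts below are assumed values chosen only for illustration and are not taken from Dhong.

```python
def sequential_latency(l1_cycles, l2_cycles):
    """L2 lookup starts only after the L1 lookup has reported a miss."""
    return l1_cycles + l2_cycles


def overlapped_latency(l1_cycles, l2_cycles):
    """Both lookups start together; on an L1 miss the slower one sets the latency."""
    return max(l1_cycles, l2_cycles)


L1_CYCLES = 2   # assumed L1 lookup time
L2_CYCLES = 10  # assumed L2 lookup time

print(sequential_latency(L1_CYCLES, L2_CYCLES))  # 12
print(overlapped_latency(L1_CYCLES, L2_CYCLES))  # 10
```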

32. Claim 23 recites equivalent limitations as stated in claim 14 and is therefore
rejected using the same grounds.

33. Claim 24 recites equivalent limitations as stated in claim 12 and is therefore 
rejected using the same grounds. 

34. Claim 30 recites equivalent limitations as set forth in claim 14 and is therefore 
rejected using the same grounds.

Conclusion 

35. The following is text cited from 37 CFR 1.111(c): In amending in reply to a
rejection of claims in an application or patent under reexamination, the applicant or 
patent owner must clearly point out the patentable novelty which he or she thinks the 
claims present in view of the state of the art disclosed by the references cited or the 
objections made. The applicant or patent owner must also show how the amendments 
avoid such references or objections. 

Any inquiry concerning this communication or earlier communications from the 
examiner should be directed to Jesse R. Moll whose telephone number is (571) 272-2703.
The examiner can normally be reached on M-F 8:00 am - 4:30 pm.

If attempts to reach the examiner by telephone are unsuccessful, the examiner's
supervisor, Kim Huynh, can be reached on (571) 272-4147. The fax phone number for
the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of an application may be obtained from the 
Patent Application Information Retrieval (PAIR) system. Status information for 
published applications may be obtained from either Private PAIR or Public PAIR. 
Status information for unpublished applications is available through Private PAIR only. 
For more information about the PAIR system, see http://pair-direct.uspto.gov. Should 
you have questions on access to the Private PAIR system, contact the Electronic 
Business Center (EBC) at 866-217-9197 (toll-free). 




